Abstract. We study two-player security games, which can
be viewed as sequences of nonzero-sum matrix games played 
by an Attacker and a Defender. The evolution of the game is 
based on a stochastic fictitious play process. Players do not 
have access to each other's payoff matrix. Each has to observe 
the other's actions up to present and plays the action generated 
based on the best response to these observations. However, when 
the game is played over a communication network, there are 
several practical issues that need to be taken into account: 
First, the players may make random decision errors from 
time to time. Second, the players' observations of each other's 
previous actions may be incorrect. The players will try to 
compensate for these errors based on the information they 
have. We examine the convergence properties of the game in such
scenarios, and establish convergence to the equilibrium point 
under some mild assumptions when both players are restricted 
to two actions. 

I. Introduction 

Game theory has recently been used as an effective tool 
to model and solve many security problems in computer 
and communication networks. In a noncooperative matrix 
game between an Attacker and a Defender, if the payoff 
matrices are assumed to be known to both players, each 
player can compute the set of Nash equilibria of the game 
and play one of these strategies to maximize her expected
gain (or minimize her expected loss); see footnote 1. However, in practice,
the players do not necessarily have full knowledge of each 
other's payoff function. If the game is repeated, a mechanism 
called fictitious play (FP) can be used for each player to learn 
her opponent's motivations. In an FP process, each player
observes all the actions and makes estimates of the mixed 
strategy of her opponent. At each stage, she updates this 
estimate and plays the pure strategy that is the best response 
(or generated based on the best response) to the current 
estimate of the other's mixed strategy. It can be seen that in an
FP process, if one player plays a fixed strategy (either of the
pure or mixed type), the other player's strategy will converge
to the best response to this fixed strategy. Furthermore, it
has been shown that, for many classes of games, such an FP
process eventually leads both players to play the Nash
equilibrium.
1 The problem of each player choosing a Nash equilibrium out of multiple
Nash equilibria is not discussed within the scope of this paper. 



In this paper, we examine a two-player game, where
an Attacker (denoted as player 1 or $P_1$) and a Defender
(denoted as player 2 or $P_2$) participate in a discrete-time
repeated nonzero-sum matrix game. In a general setting, the
Attacker has $m$ possible actions and the Defender has $n$
possible actions to choose from. When such a security game
is played between two automated systems over a network, 
in order to have a good model, we have to take into account 
several practical issues. First, the players may make random 
decision errors from time to time. Instead of playing an
action $a_i$ that is the output of the best-response computation,
player $i$ may play another action $a_i'$ with some probability
(which is typically small for functional systems). Second,
the observation that each player makes on her opponent's 
actions may also be incorrect, which will definitely affect 
her own responding actions. There are many factors giving 
rise to these problems: the non-ideality of electronic and
software systems, the uncertain and noisy characteristics of
observation data, and the erroneous nature of the channels 
on which commands and observations are communicated, to 
name a few. 

It is these scenarios that we aim to address in this 
paper. We examine convergence of players' strategies in the 
FP process with decision and observation errors. If these 
strategies do converge, we quantify the new Nash equilibrium 
and thus estimate how these decision and observation errors 
affect the learning process and the equilibrium of the game. 

Security games have been examined extensively in a 
number of papers; see, for example, [1]-[4]. The work in
[5] employs the framework of Bayesian games to address
the intrusion detection problem in wireless ad hoc networks.
In [6], the author examines the intrusion detection problem
in heterogeneous networks as a nonzero-sum static game. The
work in [7] addresses this problem using the framework of 
zero-sum stochastic games [8]. In [9], we develop a network 
model based on linear influence networks that allows us to 
take into consideration the correlation among the nodes in 
terms of both security assets and vulnerabilities. 

Relevant literature on fictitious play can be found in [10]- 
[16]. For two-player zero-sum classical FP, the convergence 
proof was obtained for arbitrary numbers of actions for each 
player ($m \times n$) [10]. For nonzero-sum games, the proofs for
two-player FP have been found for the case where one player 
is restricted to two actions (See [12] for classical FP and [13] 
for stochastic FP). In [19], we address the classical FP and 
stochastic FP with imperfect observations for the case where 
each player is restricted to two actions. 

Our contributions in this paper are as follows. First, we 
formulate the repeated security games where players make 



random decision errors as a fictitious play process. We 
discuss the convergence of such games in the general case 
with arbitrary numbers of actions for each player. We then 
establish the convergence property for several classes of 
games with decision errors where both players are restricted 
to two actions. Second, we examine the fictitious play 
process where the players' observations are imperfect and 
the players try to compensate for the observation errors. We 
again establish the convergence property for the case where 
both players are restricted to two actions. We point out a 
number of scenarios that can be considered as special cases 
of this result. 

In Section II, we introduce some background and notation
adopted from [13], [14]. The analysis for the stochastic FP
with decision errors is presented in Section III. In Section
IV, we address the FP with observation errors. Finally, some
concluding remarks end the paper.

II. Background 

A. Static games 

We present an overview of some concepts for static
security games, where player $P_1$ has $m$ and player $P_2$ has $n$
possible actions. In equations written for the generic player
$P_i$, $i = 1, 2$, we use $k$ to denote $m$ or $n$. Denote by
$p_1 \in \Delta(m)$ and $p_2 \in \Delta(n)$ a pair of mixed strategies for $P_1$
and $P_2$, respectively, where $\Delta(k)$ is the simplex in $\mathbb{R}^k$, i.e.,

$$\Delta(k) = \Big\{ s \in \mathbb{R}^k \;\Big|\; s_j \ge 0,\ j = 1, \dots, k,\ \sum_{j=1}^{k} s_j = 1 \Big\}. \tag{1}$$

The utility function of $P_i$, $U_i(p_i, p_{-i})$, is given by (see footnote 2)

$$U_i(p_i, p_{-i}) = p_i^T M_i p_{-i} + \tau_i H(p_i), \tag{2}$$

where $M_i$ is the payoff matrix of $P_i$, $i = 1, 2$, and
$H : \mathrm{Int}(\Delta(k)) \to \mathbb{R}$ is the entropy function of the probability
vector $p_i$: $H(p_i) = -p_i^T \log(p_i)$ (note that $M_1$ is of
dimension $m \times n$ and $M_2$ of dimension $n \times m$). The weighted entropy
$\tau_i H(p_i)$ with $\tau_i \ge 0$ is introduced to boost mixed strategies.
In a security game, $\tau_i$ represents how much player $i$ wants
to randomize its actions, and thus is not necessarily known
to the other player. Also, for $\tau_1 = \tau_2 = 0$ (referred to as
classical FP), the best response mapping can be set-valued,
while it has a unique value when $\tau_i > 0$ (referred to as
stochastic FP) [4], [14]. For a static game, each player selects
an integer action $a_i$ according to the mixed strategy $p_i$. The
(instant) payoff for player $P_i$ is $v_{a_i}^T M_i v_{a_{-i}} + \tau_i H(p_i)$, where
we use $v_j$, $j = 1, \dots, k$, to indicate the $j$th vertex of the
simplex $\Delta(k)$ (for example, when $k = 2$, $v_1 = [1\ 0]^T$ for
the first action, and $v_2 = [0\ 1]^T$ for the second action). For
a pair of mixed strategies $(p_1, p_2)$, the utility functions are
given by the expected payoffs:

$$U_i(p_i, p_{-i}) = E\big[ v_{a_i}^T M_i v_{a_{-i}} \big] + \tau_i H(p_i). \tag{3}$$

2 As standard in the game theory literature, the index $-i$ is used to indicate
those of other players, or the opponent's in this case.



Now, the best response mappings $\beta_1 : \Delta(n) \to \Delta(m)$ and
$\beta_2 : \Delta(m) \to \Delta(n)$ are defined as:

$$\beta_i(p_{-i}) = \arg\max_{p_i \in \Delta(k)} U_i(p_i, p_{-i}). \tag{4}$$

If $\tau_i > 0$, from (2), the best response is unique as mentioned
earlier, and is given by the soft-max function:

$$\beta_i(p_{-i}) = \sigma\!\left( \frac{M_i p_{-i}}{\tau_i} \right), \tag{5}$$

where the soft-max function $\sigma : \mathbb{R}^k \to \mathrm{Int}(\Delta(k))$ is
defined as

$$(\sigma(x))_j = \frac{e^{x_j}}{\sum_{l=1}^{k} e^{x_l}}, \quad j = 1, \dots, k. \tag{6}$$

Note that $(\sigma(x))_j > 0$, and thus the range of the soft-max
function is just the interior of the simplex.
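As an illustration of (5) and (6), the following minimal Python sketch evaluates the soft-max best response for a 2 x 2 game. The payoff matrix `M1`, the opponent strategy, and the value of `tau` are hypothetical values chosen only for the example; the function names are not taken from the paper.

```python
import math

def softmax(x):
    """Soft-max map (6): returns a point in the interior of the simplex."""
    shift = max(x)  # subtracting the maximum avoids overflow, result unchanged
    exps = [math.exp(v - shift) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def best_response(M, p_opp, tau):
    """Best response (5): sigma(M p_opp / tau), valid for tau > 0."""
    scores = [sum(M[r][c] * p_opp[c] for c in range(len(p_opp))) / tau
              for r in range(len(M))]
    return softmax(scores)

# Hypothetical payoff matrix of P1 and a uniform opponent strategy.
M1 = [[1.0, 5.0], [3.0, 2.0]]
beta = best_response(M1, [0.5, 0.5], tau=1.0)
print(beta)
```

Every component of the result is strictly positive, which is why the best response is always a completely mixed strategy when the entropy weight is positive.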

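Finally, a (mixed strategy) Nash equilibrium is defined to
be a pair $(p_1^*, p_2^*) \in \Delta(m) \times \Delta(n)$ such that for all $p_1 \in
\Delta(m)$ and $p_2 \in \Delta(n)$

$$U_i(p_i, p_{-i}^*) \le U_i(p_i^*, p_{-i}^*), \quad i = 1, 2. \tag{7}$$

We can also write a Nash equilibrium $(p_1^*, p_2^*)$ as the fixed
point of the best response mappings:

$$p_i^* = \beta_i(p_{-i}^*), \quad i = 1, 2. \tag{8}$$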

B. Fictitious play 

1) Discrete-Time Fictitious Play: From the static game
described in Subsection II-A, we define discrete-time FP as
follows. Suppose that the game is repeated at times $k \in
\{0, 1, 2, \dots\}$. The empirical frequency $q_i(k)$ of player $P_i$ is
given by

$$q_i(k+1) = \frac{1}{k+1} \sum_{j=0}^{k} v_{a_i(j)}. \tag{9}$$

Using induction, we can prove the following recursive relation:

$$q_i(k+1) = \frac{k}{k+1}\, q_i(k) + \frac{1}{k+1}\, v_{a_i(k)}. \tag{10}$$

At time $k$, player $P_i$ picks the best response to the empirical
frequency of the opponent's actions:

$$p_i(k) = \beta_i(q_{-i}(k)). \tag{11}$$
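The equivalence between the running average (9) and the recursion (10) can be checked numerically. The sketch below is only an illustration: the action history is a made-up sequence, and the helper names are hypothetical.

```python
def vertex(action, k):
    """The vertex of the simplex that corresponds to a pure action."""
    return [1.0 if j == action else 0.0 for j in range(k)]

def update_frequency(q, action, step):
    """Recursion (10): q(step+1) = step/(step+1) q(step) + 1/(step+1) v."""
    v = vertex(action, len(q))
    return [(step * q[j] + v[j]) / (step + 1) for j in range(len(q))]

actions = [0, 1, 1, 0, 1]  # hypothetical history a_i(0), ..., a_i(4)
q = [0.0, 0.0]             # q(0) carries zero weight at step 0
for step, a in enumerate(actions):
    q = update_frequency(q, a, step)

# Direct average (9) over the same history.
direct = [actions.count(j) / len(actions) for j in range(2)]
print(q, direct)
```

The recursive form only needs the previous frequency and the latest action, so a player does not have to store the whole history.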

2) Continuous-Time Fictitious Play: From the equations
of discrete-time FP (10), (11), the continuous-time version of
the iteration can be stated as follows ([13], [14]; also see
[15], [19] for the derivation):

$$\dot p_i(t) = \beta_i(p_{-i}(t)) - p_i(t), \quad i = 1, 2. \tag{12}$$
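To get a feel for the dynamics (12), one can integrate them numerically. The following sketch uses a plain forward-Euler scheme on a hypothetical 2 x 2 game; the payoff matrices, the entropy weights, the step size, and the initial conditions are all assumptions made for illustration, and forward Euler is a numerical choice that does not come from the paper.

```python
import math

def softmax(x):
    shift = max(x)
    exps = [math.exp(v - shift) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def best_response(M, p_opp, tau):
    """Soft-max best response (5)."""
    return softmax([sum(M[r][c] * p_opp[c] for c in range(len(p_opp))) / tau
                    for r in range(len(M))])

def integrate_fp(M1, M2, tau1, tau2, p1, p2, dt=0.01, steps=5000):
    """Forward-Euler integration of dp_i/dt = beta_i(p_-i) - p_i."""
    for _ in range(steps):
        b1 = best_response(M1, p2, tau1)
        b2 = best_response(M2, p1, tau2)
        # Each update is a convex combination, so p1 and p2 stay in the simplex.
        p1 = [x + dt * (b - x) for x, b in zip(p1, b1)]
        p2 = [x + dt * (b - x) for x, b in zip(p2, b2)]
    return p1, p2

# Hypothetical payoff matrices (M1 is m x n, M2 is n x m, here both 2 x 2).
M1 = [[1.0, 5.0], [3.0, 2.0]]
M2 = [[4.0, 1.0], [3.0, 4.0]]
p1, p2 = integrate_fp(M1, M2, 1.0, 1.0, [0.9, 0.1], [0.2, 0.8])
res1 = max(abs(a - b) for a, b in zip(p1, best_response(M1, p2, 1.0)))
res2 = max(abs(a - b) for a, b in zip(p2, best_response(M2, p1, 1.0)))
print(p1, p2, res1, res2)
```

The residuals `res1` and `res2` measure how far the final point is from satisfying the fixed-point condition (8).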

C. Algorithms 

We present in this subsection two algorithms for discrete-time
stochastic FP. Algorithm II-C.1, derived from [13], [14],
[19], is used for the case when players' observations are
considered to be perfect or when they have no estimates of
observation errors. Algorithm II-C.2, a generalized version
of the one in [19], is used for players who have estimates of
observation errors and want to compensate for these errors.



1) Stochastic FP with perfect observations: In stochastic
FP, at time $k$, player $i$, $i = 1, 2$, carries out the following
steps:

1) Update the empirical frequency of the opponent using
(10).

2) Compute the best response $\beta_i(q_{-i}(k))$ using (5). (Note
that the result is always a completely mixed strategy.)

3) Generate an action $a_i(k)$ using the mixed strategy
from step (2), $a_i(k) = \mathrm{rand}[\beta_i(q_{-i}(k))]$, where
we use rand to denote the randomizer function that
gives $a_i(k)$ such that the expectation $E[v_{a_i(k)}] =
\beta_i(q_{-i}(k))$.
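The three steps above can be sketched as a short simulation. This is a minimal illustration under stated assumptions, not the authors' implementation: the payoff matrices, entropy weights, uniform initial frequencies, and random seed are hypothetical, and `random.choices` plays the role of the randomizer rand.

```python
import math
import random

def softmax(x):
    shift = max(x)
    exps = [math.exp(v - shift) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def best_response(M, q_opp, tau):
    """Step 2: soft-max best response (5) to the opponent's frequency."""
    return softmax([sum(M[r][c] * q_opp[c] for c in range(len(q_opp))) / tau
                    for r in range(len(M))])

def update(q, action, k):
    """Step 1: recursion (10) applied with the action observed at stage k."""
    return [(k * q[j] + (1.0 if j == action else 0.0)) / (k + 1)
            for j in range(len(q))]

def stochastic_fp(M1, M2, tau1, tau2, stages, seed=0):
    rng = random.Random(seed)
    m, n = len(M1), len(M2)
    q1 = [1.0 / m] * m  # assumed initial frequencies; overwritten at stage 0
    q2 = [1.0 / n] * n
    for k in range(stages):
        b1 = best_response(M1, q2, tau1)
        b2 = best_response(M2, q1, tau2)
        a1 = rng.choices(range(m), weights=b1)[0]  # step 3 for P1
        a2 = rng.choices(range(n), weights=b2)[0]  # step 3 for P2
        q1 = update(q1, a1, k)
        q2 = update(q2, a2, k)
    return q1, q2

M1 = [[1.0, 5.0], [3.0, 2.0]]  # hypothetical payoffs
M2 = [[4.0, 1.0], [3.0, 4.0]]
q1, q2 = stochastic_fp(M1, M2, 1.0, 1.0, stages=2000)
print(q1, q2)
```

Both players run the same loop with their own payoff matrix, and neither function call needs the opponent's payoffs.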

2) Stochastic FP with imperfect observations: At time $k$,
player $i$, $i = 1, 2$, carries out the following steps:

1) Update the observed frequency of the opponent $\tilde q_{-i}$
using (10).

2) Compute the estimated frequency

$$\hat q_{-i} = f_{-i}(\tilde q_{-i}). \tag{13}$$

($f_1$ and $f_2$ are given below.)

3) Compute the best response $\beta_i(\hat q_{-i}(k))$ using (5). (Note
that the result is always a completely mixed strategy.)

4) Generate an action $a_i(k)$ using the mixed strategy from
step (3), $a_i(k) = \mathrm{rand}[\beta_i(\hat q_{-i}(k))]$.



D. A convergence result for $m = n = 2$ with perfect
observations



We restate the following theorem from [13], [19], for the
general case where the coefficients of the entropy terms
for the players ($\tau_1$ and $\tau_2$) are not necessarily equal (cf.
Equation (2)). This theorem is stated in [13] for $\tau_1 = \tau_2$;
however, one can always scale the payoff matrices to get the
general case.

Theorem 1: (A variant of Theorem 3.2 [13] for general
$\tau_1, \tau_2 > 0$.) Consider a two-player two-action fictitious play
process with $(L^T M_1 L)(L^T M_2 L) \neq 0$, where $M_i$ are the
payoff matrices of $P_i$, $i = 1, 2$, and $L := (1, -1)^T$. The
solutions of continuous-time FP (12) satisfy

$$\lim_{t \to \infty} \big( p_1(t) - \beta_1(p_2(t)) \big) = 0, \tag{14}$$

$$\lim_{t \to \infty} \big( p_2(t) - \beta_2(p_1(t)) \big) = 0, \tag{15}$$

where $\beta_i(p_{-i})$, $i = 1, 2$, are given in (5).



III. Security games with decision errors 

In this section, we consider the situations where players
are not totally rational or the channels carrying commands
are error prone. Specifically, $P_1$ makes decision errors with
probabilities $\alpha_{ij}$, where $\alpha_{ij}$, $i, j = 1, \dots, m$, is the probability
that $P_1$ intends to play action $i$ but ends up playing
action $j$, with $\alpha_{ij} \ge 0$ and $\sum_{j=1}^{m} \alpha_{ij} = 1$, $i = 1, \dots, m$. Similarly,
$P_2$'s decision error probabilities are given by $\epsilon_{ij}$, with $\epsilon_{ij} \ge 0$ and
$\sum_{j=1}^{n} \epsilon_{ij} = 1$, $i = 1, \dots, n$. This is called the "trembling hand"
problem in the game theory literature (see, for example, Reference
[17], Subsection 3.5.5). The decision error matrices
$D_1$ and $D_2$ collect these probabilities column by column:

$$(D_1)_{ji} = \alpha_{ij}, \quad i, j = 1, \dots, m, \tag{16}$$

$$(D_2)_{ji} = \epsilon_{ij}, \quad i, j = 1, \dots, n. \tag{17}$$

When $m = n = 2$, the decision error matrices can be
written as:

$$D_1 = \begin{pmatrix} 1 - \alpha & \gamma \\ \alpha & 1 - \gamma \end{pmatrix}, \qquad D_2 = \begin{pmatrix} 1 - \epsilon & \mu \\ \epsilon & 1 - \mu \end{pmatrix}. \tag{18}$$

The decision errors of each player in this case are illustrated
in Figure 1. In what follows, we state two standard results
in digital communications. The proofs are similar to those
for the case $m = n = 2$ in [19].

Proposition 1: Consider the two-player discrete-time fictitious
play with decision errors where the error probabilities
are given in Equations (16) and (17). Let $\bar\alpha_{ij}$, $i, j = 1, \dots, m$,
and $\bar\epsilon_{ij}$, $i, j = 1, \dots, n$, be the empirical decision error
frequencies of $P_1$ and $P_2$, respectively. If decision errors
are assumed to be independent from stage to stage, it holds
that

$$\lim_{k \to \infty} \text{a.s. } \bar\alpha_{ij} = \alpha_{ij}, \quad i, j = 1, \dots, m, \qquad \lim_{k \to \infty} \text{a.s. } \bar\epsilon_{ij} = \epsilon_{ij}, \quad i, j = 1, \dots, n, \tag{19}$$

where we use lim a.s. to denote almost sure convergence.

Proposition 2: Consider a two-player discrete-time fictitious
play with decision errors where the error probabilities
are given in Equations (16) and (17). Let $\bar q_i$ be the empirical
frequency of player $i$'s real actions and $q_i$ be the frequency
of player $i$'s intended actions (generated from the best
response at each stage). If decision errors are assumed to
be independent from stage to stage, it holds that

$$\lim_{k \to \infty} \text{a.s. } \bar q_i = D_i \Big( \lim_{k \to \infty} \text{a.s. } q_i \Big), \quad i = 1, 2, \tag{20}$$

where $D_i$ are the decision error matrices given in Equations
(16) and (17).
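The relation in Proposition 2 can be illustrated with a small Monte Carlo experiment for $m = 2$. As a simplifying assumption, the sketch below draws the intended actions from a fixed mixed strategy `p` instead of running a full FP process; the error probabilities, sample size, and seed are hypothetical.

```python
import random

def apply_matrix(D, p):
    """Matrix-vector product D p for a list-of-rows matrix."""
    return [sum(D[r][c] * p[c] for c in range(len(p))) for r in range(len(D))]

def realized_frequency(p_intended, alpha, gamma, n, seed=0):
    """Empirical frequency of the actions actually played by a two-action
    player who errs with probability alpha (intending action 0) or gamma
    (intending action 1), independently at each stage."""
    rng = random.Random(seed)
    counts = [0, 0]
    for _ in range(n):
        intended = 0 if rng.random() < p_intended[0] else 1
        error_prob = alpha if intended == 0 else gamma
        played = 1 - intended if rng.random() < error_prob else intended
        counts[played] += 1
    return [c / n for c in counts]

alpha, gamma = 0.1, 0.2  # hypothetical error probabilities
D1 = [[1 - alpha, gamma], [alpha, 1 - gamma]]  # 2 x 2 form (18)
p = [0.6, 0.4]
predicted = apply_matrix(D1, p)
empirical = realized_frequency(p, alpha, gamma, 100000)
print(predicted, empirical)
```

With these numbers the predicted real-action frequency is $D_1 p = (0.62, 0.38)^T$, and the simulated frequency should be close to it for a large number of stages.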

A. If the players know their own decision error probabilities 

We first consider the case where the players both have
complete information about the decision error matrices $D_i$,
$i = 1, 2$. If they both also know the payoff matrices $M_i$, $i =
1, 2$, then each can compute and play one of the Nash
equilibria right from the beginning. The problem can then be
considered as a stochastic version of the trembling hand
problem. Specifically, suppose that each player still wants
to randomize their empirical frequency $\bar p_i$ (instead of the
frequency of their intended actions, or intended frequency,
$p_i$) by including an entropy term in their utility function. We
then have

$$U_i(p_i, p_{-i}) = p_i^T \tilde M_i p_{-i} + \tau_i H(D_i p_i), \quad i = 1, 2, \tag{21}$$
Fig. 1. The case $m = n = 2$ where players make decision errors with
probabilities $\alpha$, $\gamma$, $\epsilon$, and $\mu$.



where the $p_i$'s are intended frequencies, $\tilde M_1 = D_1^T M_1 D_2$ and
$\tilde M_2 = D_2^T M_2 D_1$ (these are the payoff matrices resulting
from decision errors using the results in Propositions 1 and 2;
see for example [17] for the derivation). Using $\bar p_i := D_i p_i$, $i =
1, 2$, the utility functions can now be written as

$$U_i(\bar p_i, \bar p_{-i}) = \bar p_i^T M_i \bar p_{-i} + \tau_i H(\bar p_i), \quad i = 1, 2. \tag{22}$$



The game is thus reduced to the one without decision errors
and the Nash equilibrium of the static game is known from
Subsection II-A to satisfy:

$$\bar p_i^* = \beta_i(\bar p_{-i}^*), \quad i = 1, 2, \tag{23}$$

or equivalently (with the assumption that the $D_i$'s are invertible):

$$p_i^* = (D_i)^{-1} \beta_i(D_{-i}\, p_{-i}^*), \quad i = 1, 2. \tag{24}$$



The best response is now given as

$$p_i = (D_i)^{-1} \sigma\!\left( \frac{M_i\, \bar p_{-i}}{\tau_i} \right), \quad i = 1, 2. \tag{25}$$
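A small numerical instance of (25) may help. In the sketch below, the payoff matrix, the decision error matrix, the opponent's empirical frequency, and the entropy weight are hypothetical values; they are chosen so that the compensated intended strategy is itself a valid probability vector, which need not hold for arbitrary error matrices and is an assumption of this example.

```python
import math

def softmax(x):
    shift = max(x)
    exps = [math.exp(v - shift) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(A, x):
    return [sum(A[r][c] * x[c] for c in range(len(x))) for r in range(len(A))]

def inverse_2x2(A):
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if abs(det) < 1e-12:
        raise ValueError("matrix is not invertible")
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]

M1 = [[1.0, 5.0], [3.0, 2.0]]  # hypothetical payoff matrix of P1
D1 = [[0.9, 0.2], [0.1, 0.8]]  # hypothetical decision error matrix of P1
tau1 = 1.0
p2_bar = [0.5, 0.5]            # opponent's observed empirical frequency

# Desired real-action frequency: the usual soft-max best response.
target = softmax([v / tau1 for v in matvec(M1, p2_bar)])
# Intended strategy (25): pre-compensate with the inverse of D1.
p1_intended = matvec(inverse_2x2(D1), target)
# After the decision errors act, the realized frequency matches the target.
realized = matvec(D1, p1_intended)
print(target, p1_intended, realized)
```

The player only needs her own error matrix here; the opponent's errors are already reflected in the observed frequency `p2_bar`.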



In the corresponding FP process (the "trembling hand
stochastic FP"), as each player $P_i$ can observe her opponent's
empirical frequency $\bar p_{-i}$, she does not need to know
$D_{-i}$ to compute the best response. We thus state below a
convergence result for the FP process with decision errors
for the case $m = n = 2$.

Proposition 3: Consider a two-player two-action fictitious
play process where players make decision errors with invertible
decision error matrices $D_1$ and $D_2$, respectively.
Suppose that at each step, each player calculates the best
response taking into account their own decision errors using
Equation (25). If $(L^T M_1 L)(L^T M_2 L) \neq 0$, $L := (1, -1)^T$,
the solutions of the continuous-time FP process with decision
errors will satisfy

$$\lim_{t \to \infty} p_1(t) = D_1^{-1} \sigma\!\left( \frac{M_1 D_2 \lim_{t \to \infty} p_2(t)}{\tau_1} \right), \qquad \lim_{t \to \infty} p_2(t) = D_2^{-1} \sigma\!\left( \frac{M_2 D_1 \lim_{t \to \infty} p_1(t)}{\tau_2} \right). \tag{26}$$



Proof: The proof can be obtained using Theorem 1 and
the fact that $\bar p_i := D_i p_i$, $i = 1, 2$. ■
It can thus be seen that with knowledge of their own
decision errors, players can completely precompensate for
these errors and the equilibrium empirical frequencies remain
the same as those of the original game without decision
errors.

B. If the players are unaware of all the decision error 
probabilities 

We consider in this subsection a two-player fictitious
play process with decision errors where the decision error
probabilities are not known to either player. Each player
plays the regular stochastic FP Algorithm II-C.1. We are
interested in whether or not the FP process will converge,
and when it does, what the equilibrium will be. We first
examine the general case with arbitrary $m$, $n$, and then the
special case where $m = n = 2$. We first use Proposition 2
and the same arguments as in the proof of Theorem 3 of [19] to
approximate the discrete-time FP with the continuous-time
version. At time step $k$, as each player $P_i$ generates her action
$v_{a_i(k)}$ based on the best response to her opponent's empirical
frequency $\bar q_{-i}$, the expectation of $v_{a_i(k)}$, $i = 1, 2$, will be
given by

$$E[v_{a_1(k)}] = D_1 \beta_1(\bar q_2(k)), \qquad E[v_{a_2(k)}] = D_2 \beta_2(\bar q_1(k)),$$

where $D_1$ and $D_2$ account for decision errors. The mean
dynamic of the empirical frequencies can then be written as
follows:

$$\bar q_1(k+1) = \frac{k}{k+1}\, \bar q_1(k) + \frac{1}{k+1}\, D_1 \beta_1(\bar q_2(k)), \qquad \bar q_2(k+1) = \frac{k}{k+1}\, \bar q_2(k) + \frac{1}{k+1}\, D_2 \beta_2(\bar q_1(k)). \tag{27}$$



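From the mean dynamic, we can derive the continuous-time
approximation (see [20] for the derivation):

$$\dot{\bar p}_1(t) = D_1 \beta_1(\bar p_2(t)) - \bar p_1(t), \qquad \dot{\bar p}_2(t) = D_2 \beta_2(\bar p_1(t)) - \bar p_2(t), \tag{28}$$

where $\beta_i$ is the best response given in (5) and $\sigma(\cdot)$ is the soft-max function defined in (6).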



It can be seen that a pair of mixed strategies $(\bar p_1^*, \bar p_2^*)$ that
satisfies

$$\bar p_1^* = D_1 \beta_1(\bar p_2^*), \qquad \bar p_2^* = D_2 \beta_2(\bar p_1^*)$$

will be an equilibrium point of the dynamics (28). For
some results on the stability of the equilibrium point in
the continuous-time system and the discrete-time system for
general values of $m$ and $n$, we refer to [20]. When $m =
n = 2$, it turns out that the point $(\bar p_1^*, \bar p_2^*)$ is globally stable for
the continuous-time system under some mild assumptions.
We thus state the following theorem for this special case.

Theorem 2: Consider a two-player two-action fictitious
play process where players make decision errors with decision
error matrices $D_1$ and $D_2$, respectively. Suppose that
the players are unaware of all the decision error probabilities
and use the regular stochastic FP Algorithm II-C.1. If $D_i$, $i =
1, 2$, are invertible and $(L^T M_1 D_2 L)(L^T M_2 D_1 L) \neq 0$, the
solutions of the continuous-time FP process with decision errors
(28) will satisfy

$$\lim_{t \to \infty} \bar p_1(t) = D_1 \sigma\!\left( \frac{M_1 \lim_{t \to \infty} \bar p_2(t)}{\tau_1} \right), \qquad \lim_{t \to \infty} \bar p_2(t) = D_2 \sigma\!\left( \frac{M_2 \lim_{t \to \infty} \bar p_1(t)}{\tau_2} \right), \tag{29}$$

where $\sigma(\cdot)$ is the soft-max function defined in (6).

Proof: The proof, some remarks, and a numerical 
example can be found in [20]. ■ 
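The behavior described in Theorem 2 can be explored numerically. The sketch below integrates the dynamics (28) with a forward-Euler scheme for a hypothetical 2 x 2 game; the payoff matrices, error matrices, entropy weights, step size, and initial conditions are illustrative assumptions, and the Euler discretization is a numerical choice that is not part of the paper.

```python
import math

def softmax(x):
    shift = max(x)
    exps = [math.exp(v - shift) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(A, x):
    return [sum(A[r][c] * x[c] for c in range(len(x))) for r in range(len(A))]

def noisy_response(D, M, p_opp, tau):
    """D_i beta_i(p_-i): soft-max best response distorted by decision errors."""
    return matvec(D, softmax([v / tau for v in matvec(M, p_opp)]))

def integrate(M1, M2, D1, D2, tau1, tau2, p1, p2, dt=0.01, steps=5000):
    """Forward-Euler integration of the dynamics (28)."""
    for _ in range(steps):
        r1 = noisy_response(D1, M1, p2, tau1)
        r2 = noisy_response(D2, M2, p1, tau2)
        p1 = [x + dt * (r - x) for x, r in zip(p1, r1)]
        p2 = [x + dt * (r - x) for x, r in zip(p2, r2)]
    return p1, p2

M1 = [[1.0, 5.0], [3.0, 2.0]]    # hypothetical payoffs
M2 = [[4.0, 1.0], [3.0, 4.0]]
D1 = [[0.9, 0.2], [0.1, 0.8]]    # alpha = 0.1, gamma = 0.2
D2 = [[0.95, 0.1], [0.05, 0.9]]  # epsilon = 0.05, mu = 0.1
p1, p2 = integrate(M1, M2, D1, D2, 1.0, 1.0, [0.9, 0.1], [0.2, 0.8])
res1 = max(abs(a - b) for a, b in zip(p1, noisy_response(D1, M1, p2, 1.0)))
res2 = max(abs(a - b) for a, b in zip(p2, noisy_response(D2, M2, p1, 1.0)))
print(p1, p2, res1, res2)
```

Small residuals indicate that the trajectory has settled at a point satisfying the equilibrium condition of (28), which in general differs from the equilibrium of the error-free game.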

IV. Security games with observation errors 

In [19], we study the effect of observation errors on
convergence to the NE in a $2 \times 2$ FP process. We also prove
that if each player has a correct estimate of the error probabilities
of the observations, they can reverse the effect of the channel to
obtain the NE of the original static game. In this section,
we present a generalized version of these results. Consider a
two-player fictitious play game with imperfect observations
where the error channels are given in Equations (30) and
(31):

$$(C_1)_{ji} = \alpha_{ij}, \quad i, j = 1, \dots, m, \tag{30}$$

$$(C_2)_{ji} = \epsilon_{ij}, \quad i, j = 1, \dots, n, \tag{31}$$

where $\alpha_{ij}$, $i, j = 1, \dots, m$, is the probability that $P_1$'s action
$i$ is erroneously observed as action $j$, $\alpha_{ij} \ge 0$, $\sum_{j=1}^{m} \alpha_{ij} =
1$, $i = 1, \dots, m$, and $\epsilon_{ij}$, $i, j = 1, \dots, n$, is the probability that
$P_2$'s action $i$ is erroneously observed as action $j$, $\epsilon_{ij} \ge 0$,
$\sum_{j=1}^{n} \epsilon_{ij} = 1$, $i = 1, \dots, n$. Suppose that the players have
their estimates of the error probabilities as follows:

$$(\hat C_1)_{ji} = \hat\alpha_{ij}, \quad i, j = 1, \dots, m, \tag{32}$$

$$(\hat C_2)_{ji} = \hat\epsilon_{ij}, \quad i, j = 1, \dots, n, \tag{33}$$

where $\hat\alpha_{ij} \ge 0$, $\sum_{j=1}^{m} \hat\alpha_{ij} = 1$, $i = 1, \dots, m$, and $\hat\epsilon_{ij} \ge 0$,
$\sum_{j=1}^{n} \hat\epsilon_{ij} = 1$, $i = 1, \dots, n$. We first restate Propositions
1 and 2 in the context of repeated games with imperfect
observations.

Proposition 4: Consider the two-player discrete-time fictitious
play with imperfect observations where error probabilities
are given in Equations (30) and (31). Let $\bar\alpha_{ij}$, $i, j = 1, \dots, m$,
and $\bar\epsilon_{ij}$, $i, j = 1, \dots, n$, be the empirical error
frequencies of observations on $P_1$'s and $P_2$'s actions, respectively.
If channel errors are assumed to be independent
from stage to stage, it holds that

$$\lim_{k \to \infty} \text{a.s. } \bar\alpha_{ij} = \alpha_{ij}, \quad i, j = 1, \dots, m, \qquad \lim_{k \to \infty} \text{a.s. } \bar\epsilon_{ij} = \epsilon_{ij}, \quad i, j = 1, \dots, n, \tag{34}$$

where we use lim a.s. to denote almost sure convergence.

Fig. 2. Players observe their opponent's actions through binary channels
with error probabilities $\alpha$, $\gamma$, $\epsilon$, and $\mu$.

Proposition 5: Consider the two-player discrete-time fictitious
play with imperfect observations where error probabilities
are given in Equations (30) and (31). Let $\tilde q_i$ be
the observed frequency and $q_i$ be the empirical frequency
of player $i$. If channel errors are assumed to be independent
from stage to stage, it holds that

$$\lim_{k \to \infty} \text{a.s. } \tilde q_i = C_i \Big( \lim_{k \to \infty} \text{a.s. } q_i \Big), \quad i = 1, 2, \tag{35}$$

where $C_i$ are the channel error matrices given in Equations
(30) and (31).

If both players have their estimates of the error probabilities
as in Equations (32) and (33), they can play the stochastic
FP algorithm given in II-C.2 with $\hat q_{-i} = (\hat C_{-i})^{-1} \tilde q_{-i}$
to compensate for observation errors (using the results in
Propositions 4 and 5). Again we can use the same procedure
as in Subsection III-B to approximate the discrete-time FP
with the continuous-time version.
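The compensation step can be illustrated with plain arithmetic for a binary channel. In the sketch below the channel matrix, the true frequency, and the mismatched estimate of the channel are hypothetical values chosen for the example.

```python
def matvec(A, x):
    return [sum(A[r][c] * x[c] for c in range(len(x))) for r in range(len(A))]

def inverse_2x2(A):
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if abs(det) < 1e-12:
        raise ValueError("channel matrix is not invertible")
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]

C2 = [[0.9, 0.2], [0.1, 0.8]]  # true channel on P2's actions (hypothetical)
q2 = [0.3, 0.7]                # true empirical frequency of P2
observed = matvec(C2, q2)      # what P1 sees in the limit, as in (35)

# With an exact estimate of the channel, the true frequency is recovered.
estimate_exact = matvec(inverse_2x2(C2), observed)

# With a mismatched estimate, P1 best-responds to a distorted frequency.
C2_hat = [[0.85, 0.25], [0.15, 0.75]]
estimate_mismatched = matvec(inverse_2x2(C2_hat), observed)
print(observed, estimate_exact, estimate_mismatched)
```

When the estimate of the channel is exact, the inversion undoes the observation errors; otherwise the residual distortion is the product of the inverse estimated channel and the true channel applied to the true frequency.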
The continuous-time approximation is given by:

$$\dot p_1(t) = \sigma\!\left( \frac{M_1 (\hat C_2)^{-1} C_2\, p_2(t)}{\tau_1} \right) - p_1(t), \qquad \dot p_2(t) = \sigma\!\left( \frac{M_2 (\hat C_1)^{-1} C_1\, p_1(t)}{\tau_2} \right) - p_2(t).$$

It can be seen that a pair of mixed strategies $(p_1^*, p_2^*)$ that
satisfies

$$p_1^* = \sigma\!\left( \frac{M_1 (\hat C_2)^{-1} C_2\, p_2^*}{\tau_1} \right), \qquad p_2^* = \sigma\!\left( \frac{M_2 (\hat C_1)^{-1} C_1\, p_1^*}{\tau_2} \right)$$

will be an equilibrium point of these dynamics. For
some results on the stability of the equilibrium point in
the continuous-time system and the discrete-time system
for general values of $m$ and $n$, we refer to [20]. When
$m = n = 2$, again the point $(p_1^*, p_2^*)$ is globally stable for
the continuous-time system under some mild assumptions.
We have the following theorem.

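Theorem 3: Consider a two-player two-action fictitious
play game with imperfect observations where the error
channels are given in Figure 2:

$$C_1 = \begin{pmatrix} 1 - \alpha & \gamma \\ \alpha & 1 - \gamma \end{pmatrix}, \qquad C_2 = \begin{pmatrix} 1 - \epsilon & \mu \\ \epsilon & 1 - \mu \end{pmatrix}.$$

Suppose that the players have their estimates of the error
probabilities as follows:

$$\hat C_1 = \begin{pmatrix} 1 - \hat\alpha & \hat\gamma \\ \hat\alpha & 1 - \hat\gamma \end{pmatrix}, \qquad \hat C_2 = \begin{pmatrix} 1 - \hat\epsilon & \hat\mu \\ \hat\epsilon & 1 - \hat\mu \end{pmatrix}.$$

The players then play the stochastic FP given in II-C.2. If
$(L^T M_1 (\hat C_2)^{-1} C_2 L)(L^T M_2 (\hat C_1)^{-1} C_1 L) \neq 0$, the solutions
of the continuous-time FP with imperfect observations will
satisfy

$$\lim_{t \to \infty} p_1(t) = \sigma\!\left( \frac{M_1 (\hat C_2)^{-1} C_2 \lim_{t \to \infty} p_2(t)}{\tau_1} \right), \qquad \lim_{t \to \infty} p_2(t) = \sigma\!\left( \frac{M_2 (\hat C_1)^{-1} C_1 \lim_{t \to \infty} p_1(t)}{\tau_2} \right),$$

where $\sigma(\cdot)$ is the soft-max function defined in (6).

Proof: The proof, some remarks, and a numerical example
can be found in [20]. ■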

V. Conclusion 

In this paper, we have introduced and discussed some 
repeated security game models that take into account players' 
decision errors and observation errors. Each player does not 
have access to her opponent's payoff matrix and thus has 
to learn this through the fictitious play process. However, in 
a practical setting, each player is expected to make random 
decision errors from time to time and also has to respond 
to imperfectly observed actions of the other player. We have 
studied the convergence property of such games and, if the 
FP process does converge, quantified the new equilibrium. 
Such analyses will help provide guidelines for players to 
maximize their gain or minimize their loss in a nonideal 
environment. 

We normally start from the mean dynamics of the discrete- 
time version of a game, proceed to continuous-time approx- 
imation and then analyze convergence of this continuous- 
time version. Although the convergence of the continuous- 
time fictitious play does not guarantee the almost sure 



convergence of the discrete-time counterpart, it does provide 
the necessary limiting results for the discrete-time version. 